Abstract. We establish a log-supermodularity property for probability distributions
on binary patterns observed at the tips of a tree that are generated
under any 2-state Markov process. We illustrate the applicability of this result
in phylogenetics by deriving an inequality relevant to estimating expected
future phylogenetic diversity under a model of species extinction. In a further 
application of the log-supermodularity property, we derive a purely combina- 
torial inequality for the parsimony score of a binary character. The proofs of 
our results exploit two classical theorems in the combinatorics of finite sets. 



1. Introduction 

Finite-state Markov processes on trees are widely used in evolutionary biology to 
model the way in which discrete characteristics of present-day species have evolved 
from the state present in some common ancestor [5, 12]. In this paper, we investigate
a generic inequality that applies to 2-state Markov processes on trees, and provide 
two applications. 

The first application, which was the motivation for our study, is to the theory 
of biodiversity conservation. We consider the expected loss of 'phylogenetic di- 
versity' under a model in which extinction risk is associated with an underlying 
state that evolves on the tree. We are interested in comparing this expected loss to
that under simpler models in which extinction events are treated independently; we find
that when extinction events reflect phylogenetic history, the expected loss of
phylogenetic diversity is always greater than or equal to that predicted by an in- 
dependent extinction scenario. Essentially, this is because the probability that an 
entire 'clade' (the set of present-day species descended from a vertex in the tree) 
becomes extinct is higher when the evolution of the influencing state is taken into 
consideration than when we treat extinctions as independent events. 

In a second application, we derive a new, purely combinatorial result concerning 
the 'parsimony score' of a binary character on a tree. We also briefly discuss 
how the generic inequality for 2-state Markov processes relates to recent work on 
phylogenetic invariants and inequalities for particular submodels. 

2. Markov processes on trees



Let $T = (V_T, E_T)$ be a tree with leaf set $X$. Consider a Markov random field
on $T$ with state space $\{0, 1\}$, and for each vertex $v$ of $T$, let $\xi(v)$ be the random
state (0 or 1) that $v$ is assigned. This process is usually described as follows. We
have a root vertex $\rho$ for which we specify a probability, say $\pi_i$, that $\xi(\rho) = i$, for
$i \in \{0, 1\}$. Direct all the edges of $T$ away from $\rho$ and for any arc $(r, s)$ of the
resulting directed tree $T = (V_T, A_T)$, let $P^{(r,s)}$ denote the $2 \times 2$ transition matrix
for which the $ij$-entry (for $i, j \in \{0, 1\}$) is the conditional probability that $\xi(s) = j$
given that $\xi(r) = i$. Specifying $\pi = [\pi_0, \pi_1]$ together with the transition matrices
$P^{(r,s)}$ for all the arcs $(r, s)$ of $T$ uniquely defines the Markov random field on $T$
(see, for example, [3, 11, 12]); an explicit formula appears below (Eqn. (1)). We will
assume throughout that $\pi$ is strictly positive and that the following condition holds
on each of the transition matrices:

$\det P^{(r,s)} \geq 0.$

Notice that this determinant condition automatically holds if one views the tran- 
sition matrix for an arc as describing the net effect of a continuous-time Markov 
process operating for some duration for that arc. Note however that we are not 
assuming that any such process is the same between the arcs of T (i.e. the model 
is not necessarily stationary). 

For $U \subseteq V_T$ let $P(U)$ denote the probability that $U$ is precisely the set of vertices
of $T$ in state 0; that is:

$P(U) = \mathbb{P}(\{v \in V_T : \xi(v) = 0\} = U).$

To express $P(U)$ in terms of the transition matrices and $\pi$, let:

$\delta(U, v) = \begin{cases} 0, & \text{if } v \in U; \\ 1, & \text{if } v \in V_T - U. \end{cases}$

Then, the Markov property gives:

(1) $P(U) = \pi_{\delta(U,\rho)} \prod_{(r,s) \in A_T} P^{(r,s)}_{\delta(U,r)\,\delta(U,s)}.$

For any subset $W$ of $X$ (the leaf set of $T$), let $p_W$ denote the probability that $W$
is precisely the set of leaves of $T$ that are in state 0. This marginal probability is
given by:

(2) $p_W = \sum_{U \in \mathcal{A}_W} P(U),$

where:

(3) $\mathcal{A}_W := \{U \subseteq V_T : U \cap X = W\}.$

An example to illustrate this concept is provided in Fig. 1. 
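
The definitions in Eqns. (1)-(3) can be checked by brute-force enumeration on a small tree. The following is a minimal sketch, not the authors' code: it assumes the tree shape that can be read off the arcs named in the Fig. 1 caption (root `rho` with children `a`, `b`, `s`, and `s` with children `c`, `d`), and the names `joint_prob`, `leaf_pattern_prob`, as well as the numerical values of `pi` and `M`, are illustrative assumptions.

```python
from itertools import product

# Assumed tree, read off the arcs named in the Fig. 1 caption:
# root "rho" has children a, b, s; vertex s has children c, d.
ARCS = [("rho", "a"), ("rho", "b"), ("rho", "s"), ("s", "c"), ("s", "d")]
INTERNAL = ["rho", "s"]
LEAVES = ["a", "b", "c", "d"]
VERTICES = INTERNAL + LEAVES


def joint_prob(U, pi, P):
    """Eqn. (1): probability that U is exactly the set of vertices in state 0."""
    delta = {v: 0 if v in U else 1 for v in VERTICES}
    prob = pi[delta["rho"]]
    for r, s in ARCS:
        prob *= P[(r, s)][delta[r]][delta[s]]
    return prob


def leaf_pattern_prob(W, pi, P):
    """Eqns. (2)-(3): sum P(U) over every U whose intersection with X is W."""
    total = 0.0
    for bits in product([0, 1], repeat=len(INTERNAL)):
        zero_internal = {v for v, b in zip(INTERNAL, bits) if b == 0}
        total += joint_prob(set(W) | zero_internal, pi, P)
    return total


# Illustrative (assumed) parameters: the same matrix on every arc, det = 0.7.
pi = [0.6, 0.4]
M = [[0.9, 0.1], [0.2, 0.8]]
P = {arc: M for arc in ARCS}

p_ac = leaf_pattern_prob({"a", "c"}, pi, P)
print(p_ac)
```

Summing `leaf_pattern_prob` over all 16 subsets of the leaf set should give 1, which is a quick sanity check on the enumeration.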

A number of authors have noticed that certain inequalities hold for quadratic
functions of the $p_W$ values. For example, for any $x, y \in X$ with $x \neq y$, it is well
known that:

$p_{\{x\}} \cdot p_{\{y\}} \leq p_{\{x,y\}} \cdot p_{\emptyset}.$
Figure 1. In this example, if $W = \{a, c\}$, then

$p_W = \pi_0 P^{(\rho,a)}_{00} P^{(\rho,b)}_{01} P^{(\rho,s)}_{00} P^{(s,c)}_{00} P^{(s,d)}_{01} + \pi_0 P^{(\rho,a)}_{00} P^{(\rho,b)}_{01} P^{(\rho,s)}_{01} P^{(s,c)}_{10} P^{(s,d)}_{11}$
$\quad + \pi_1 P^{(\rho,a)}_{10} P^{(\rho,b)}_{11} P^{(\rho,s)}_{10} P^{(s,c)}_{00} P^{(s,d)}_{01} + \pi_1 P^{(\rho,a)}_{10} P^{(\rho,b)}_{11} P^{(\rho,s)}_{11} P^{(s,c)}_{10} P^{(s,d)}_{11}.$



Moreover, the following inequality has been described: for subsets $\{x, y\}$ and
$\{x, z\}$ of $X$ where $x, y, z$ are distinct, we have

$p_{\{x,y\}} \cdot p_{\{x,z\}} \leq p_{\{x,y,z\}} \cdot p_{\{x\}}.$

The following proposition shows that these are special cases of a much more general 
inequality. 

Proposition 2.1. For any 2-state Markov process on a tree with leaf set $X$, and
any two subsets $Y, Z$ of $X$, we have:

$p_Y \cdot p_Z \leq p_{Y \cup Z} \cdot p_{Y \cap Z}.$



Proof. Let $A, B$ be arbitrary subsets of $V_T$. We first establish the following:

(4) $P(A) \cdot P(B) \leq P(A \cup B) \cdot P(A \cap B).$

Applying Eqn. (1) to $U \in \{A, B, A \cup B, A \cap B\}$, the product $P(A) \cdot P(B)$ and
the product $P(A \cup B) \cdot P(A \cap B)$ can each be written as a product of two entries
of $\pi$ multiplied by a product over the arcs $(r, s)$ of $T$ of two entries of $P^{(r,s)}$.
Moreover, regardless of where $r$ and $s$ lie in relation to the sets $A, B$, the product
of the two $\pi$ terms is the same in $P(A) \cdot P(B)$ and $P(A \cup B) \cdot P(A \cap B)$ (i.e., we have
$\pi_{\delta(A,\rho)} \pi_{\delta(B,\rho)} = \pi_{\delta(A \cup B,\rho)} \pi_{\delta(A \cap B,\rho)}$), while the product of the two $P^{(r,s)}$ terms
is the same in $P(A) \cdot P(B)$ and $P(A \cup B) \cdot P(A \cap B)$, except for the cases in which either
(i) $r \in A - B$ and $s \in B - A$, or (ii) $r \in B - A$ and $s \in A - B$. However, in both
case (i) and (ii), the product $P^{(r,s)}_{01} P^{(r,s)}_{10}$ appears in the term for $P(A) \cdot P(B)$ while
$P^{(r,s)}_{00} P^{(r,s)}_{11}$ appears in the term for $P(A \cup B) \cdot P(A \cap B)$, and the former product is
less than or equal to the latter since

$P^{(r,s)}_{00} P^{(r,s)}_{11} - P^{(r,s)}_{01} P^{(r,s)}_{10} = \det P^{(r,s)}$

and $\det P^{(r,s)} \geq 0$ by assumption. Consequently, all the terms in $P(A) \cdot P(B)$ are
either less than or equal to (in cases (i) and (ii)), or equal to (in all remaining cases)
the corresponding terms in $P(A \cup B) \cdot P(A \cap B)$. This establishes (4).

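We now invoke a classical result of Ahlswede and Daykin (1978) [1], sometimes
called the 'four functions theorem'. A particular form of this theorem that suffices
for our purposes is the following (we follow [2]). Suppose we have a finite set $S$ and
a function $\alpha$ that assigns a non-negative real number to each subset of $S$. Suppose
that $\alpha$ satisfies the property that for all subsets $A, B$ of $S$:

$\alpha(A)\alpha(B) \leq \alpha(A \cup B)\alpha(A \cap B).$

For a collection $\mathcal{C}$ of subsets of $S$, let $\alpha(\mathcal{C}) := \sum_{C \in \mathcal{C}} \alpha(C)$. Then for any two
collections of subsets of $S$, $\mathcal{A}$ and $\mathcal{B}$, say, we have:

(5) $\alpha(\mathcal{A})\alpha(\mathcal{B}) \leq \alpha(\mathcal{A} \vee \mathcal{B})\alpha(\mathcal{A} \wedge \mathcal{B}),$

where

$\mathcal{A} \vee \mathcal{B} := \{E \subseteq S : E = A \cup B, A \in \mathcal{A}, B \in \mathcal{B}\},$ and
$\mathcal{A} \wedge \mathcal{B} := \{E \subseteq S : E = A \cap B, A \in \mathcal{A}, B \in \mathcal{B}\}.$

We will apply this to our problem by taking $S = V_T$, $\alpha(U) = P(U)$ and noting that
$\alpha$ satisfies the required hypothesis by (4). Recall the definition of $\mathcal{A}_W$ in (3) and
note that:

$\mathcal{A}_Y \vee \mathcal{A}_Z = \mathcal{A}_{Y \cup Z},$ and $\mathcal{A}_Y \wedge \mathcal{A}_Z = \mathcal{A}_{Y \cap Z}.$

Thus taking $\mathcal{A} = \mathcal{A}_Y$ and $\mathcal{B} = \mathcal{A}_Z$ in (5) we deduce that:

$\alpha(\mathcal{A}_Y)\alpha(\mathcal{A}_Z) \leq \alpha(\mathcal{A}_{Y \cup Z})\alpha(\mathcal{A}_{Y \cap Z}).$

The proposition now follows by observing that $p_W = \alpha(\mathcal{A}_W)$ for all subsets $W$ of
$X$, in particular the subsets $Y$, $Z$, $Y \cup Z$ and $Y \cap Z$.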

□ 
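
Proposition 2.1 can also be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it uses the five-arc tree suggested by the Fig. 1 caption, draws random transition matrices with non-negative determinant, and reports the largest value of $p_Y p_Z - p_{Y \cup Z} p_{Y \cap Z}$ over all pairs of leaf subsets. The helper names (`pattern_probs`, `max_gap`, `random_matrix`) are hypothetical. A second run with one arc carrying the state-swapping matrix (determinant -1) shows that the determinant condition cannot simply be dropped.

```python
import random
from itertools import product

# Assumed tree: root rho with children a, b, s; s with children c, d.
ARCS = [("rho", "a"), ("rho", "b"), ("rho", "s"), ("s", "c"), ("s", "d")]
INTERNAL, LEAVES = ["rho", "s"], ["a", "b", "c", "d"]


def pattern_probs(pi, P):
    """Return {W: p_W} for every leaf subset W, summing Eqn. (1) over all states."""
    probs = {}
    for bits in product([0, 1], repeat=len(INTERNAL) + len(LEAVES)):
        state = dict(zip(INTERNAL + LEAVES, bits))
        pr = pi[state["rho"]]
        for r, s in ARCS:
            pr *= P[(r, s)][state[r]][state[s]]
        W = frozenset(x for x in LEAVES if state[x] == 0)
        probs[W] = probs.get(W, 0.0) + pr
    return probs


def max_gap(probs):
    """Largest p_Y p_Z - p_{Y u Z} p_{Y n Z}; Proposition 2.1 predicts <= 0."""
    return max(probs[Y] * probs[Z] - probs[Y | Z] * probs[Y & Z]
               for Y in probs for Z in probs)


def random_matrix(rng):
    """Random 2x2 stochastic matrix with det = 1 - a - b >= 0."""
    a, b = rng.random(), rng.random()   # a = P_01, b = P_10
    if a + b > 1:
        a, b = 1 - a, 1 - b             # reflect so that a + b <= 1
    return [[1 - a, a], [b, 1 - b]]


rng = random.Random(0)
worst = float("-inf")
for _ in range(200):
    q = rng.uniform(0.05, 0.95)
    P = {arc: random_matrix(rng) for arc in ARCS}
    worst = max(worst, max_gap(pattern_probs([q, 1 - q], P)))

# Negative determinant on one arc: leaf a always disagrees with the root.
identity, swap = [[1, 0], [0, 1]], [[0, 1], [1, 0]]
P_bad = {arc: identity for arc in ARCS}
P_bad[("rho", "a")] = swap
bad_gap = max_gap(pattern_probs([0.5, 0.5], P_bad))

print(worst, bad_gap)
```

Since the choice $Y = Z$ gives equality, `worst` is expected to be zero up to rounding, while `bad_gap` is strictly positive (the pair $Y = \{a\}$, $Z = \{b, c, d\}$ violates the inequality).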

3. Applications in phylogenetics 

We now describe some applications of Proposition 2.1.

3.1. Expected future phylogenetic diversity. We first show how Proposition 2.1,
together with another inequality, provides a general inequality concerning the loss
of expected future biodiversity under species extinction models.

Suppose that $T$ is a rooted tree with leaf set $X$, and with each arc $e = (u, v)$ of $T$
there is an associated length $\lambda_e$. Given a subset $Y$ of $X$, the phylogenetic diversity
(PD) of $Y$, denoted $\varphi_Y$, is the sum of the lengths of the edges of the minimal subtree
of $T$ connecting the root and the leaves in $Y$. Under various possible interpretations
of the $\lambda$ values, PD has been widely used as a measure for quantifying present and
expected future biodiversity [4, 5, 10].

For each species $x \in X$ let $E_x$ denote the event that species $x$ is extinct at some
future time $t$. Then the expected phylogenetic diversity of the set of species that
are extant at time $t$, referred to as expected future PD and denoted $\mathbb{E}[\varphi]$, is given
by:

(6) $\mathbb{E}[\varphi] = \sum_{e=(u,v) \in A_T} \lambda_e \cdot \Big(1 - \mathbb{P}\Big(\bigcap_{x \in C_v} E_x\Big)\Big) = \varphi_X - \sum_{e=(u,v) \in A_T} \lambda_e \cdot \mathbb{P}\Big(\bigcap_{x \in C_v} E_x\Big),$

where $C_v$ denotes the subset of $X$ that is separated from the root by $v$. A simple
model, referred to as the generalized field of bullets model (g-FOB) in [5]
(generalizing an earlier model from [10]), assumes that the events $E_x$ are independent.
Then, if we let $p_x = \mathbb{P}(E_x)$, the value of $\mathbb{P}(\bigcap_{x \in C_v} E_x)$ in (6) (the probability of the
extinction of all the species descended from $v$) is given by:

(7) $\mathbb{P}\Big(\bigcap_{x \in C_v} E_x\Big) = \prod_{x \in C_v} p_x.$

An example to illustrate this concept is provided in Fig. 2.




Figure 2. If the species indicated by * become extinct, then the 
remaining phylogenetic diversity is the sum of the lengths of the 
bold edges indicated. 
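
As a worked illustration of Eqns. (6) and (7), the following sketch computes expected future PD under a g-FOB model. It is a hypothetical example: the tree shape (borrowed from Fig. 1), the arc lengths in `LENGTHS`, the clade table `CLADE`, and the extinction probabilities are all assumed values chosen for illustration, and `expected_pd_gfob` is not a name from the paper.

```python
from math import prod

# Assumed rooted tree with illustrative arc lengths (lambda_e).
LENGTHS = {("rho", "a"): 2.0, ("rho", "b"): 2.0, ("rho", "s"): 1.0,
           ("s", "c"): 1.0, ("s", "d"): 1.0}
# C_v: the leaves separated from the root by v, for the head v of each arc.
CLADE = {"a": ["a"], "b": ["b"], "s": ["c", "d"], "c": ["c"], "d": ["d"]}


def expected_pd_gfob(ext):
    """Eqns. (6)-(7): an arc survives unless every leaf below it goes extinct."""
    return sum(lam * (1 - prod(ext[x] for x in CLADE[v]))
               for (_, v), lam in LENGTHS.items())


# Every species goes extinct independently with probability 0.5 (assumed).
ext = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
print(expected_pd_gfob(ext))  # 2*0.5 + 2*0.5 + 1*0.75 + 1*0.5 + 1*0.5 = 3.75
```

With all extinction probabilities set to 0 the function returns the total tree length $\varphi_X$ (here 7.0), and with all set to 1 it returns 0.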

The assumption that the events E x are independent is likely to be unrealistic 
in most settings (see, for example, [H [13]). For example, species that are 'close 
together' in T are more likely to share attributes that may put them at risk in a 
hostile future environment. 

To take a simple but topical scenario, consider extinction risk due to climate
change. Suppose that the extinction risk of each species in $X$ is partially influenced
by some associated binary state (0 or 1) where state 0 confers an elevated risk of
extinction under climate change. We suppose that these states are not known in
advance for the species in $X$, and that this state has evolved under some Markovian
model on $T$. Once the states are determined at the leaves, extinction proceeds
according to the g-FOB model, where species $x$ is extinct at time $t$ with probability
$p_x^i$ if it is in state $i \in \{0, 1\}$. We call this a state-based field of bullets model (s-FOB).
Note that this includes the g-FOB model as a special case where $p_x^0 = p_x^1$ for all
$x$. Moreover, once we condition on the state for each leaf, an s-FOB model is just
a g-FOB model with modified extinction probabilities, but we are assuming that
these states are unknown (in line with the uncertainty over what features may be
helpful for an organism in a future climate).

With any s-FOB model we also have an associated g-FOB model in which the
extinction probability of each species $x$ is the same as in the s-FOB model. That
is, in the g-FOB model we set:

(8) $p_x = p_x^0 \, \mathbb{P}(\xi(x) = 0) + p_x^1 \, \mathbb{P}(\xi(x) = 1),$

where $\xi$ describes the Markov process for the binary character. A natural question
arises: how does the future expected PD of an s-FOB model compare with that of
its associated g-FOB model? The following result provides a general inequality.

Theorem 3.1. Consider a fixed tree with branch lengths and leaf set $X$. Consider
an s-FOB model, in which state 1 is advantageous for each species, i.e., $p_x^1 \leq p_x^0$
for all $x \in X$. Then the expected future PD of this model is less than or equal to the
expected future PD of the associated g-FOB model.

Proof. In view of (6) and (7), it suffices to show that:

(9) $\prod_{x \in C_v} p_x \leq \mathbb{P}\Big(\bigcap_{x \in C_v} E_x\Big),$

where $p_x$ is defined by Eqn. (8) and the right-hand side is computed under the s-FOB model.

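For each subset $W$ of $C_v$ let $p_W$ denote the probability that the set of elements
of $C_v$ in state 0 is precisely $W$. Then:

$\mathbb{P}\Big(\bigcap_{x \in C_v} E_x\Big) = \sum_{W \subseteq C_v} p_W \prod_{x \in W} p_x^0 \prod_{x \in C_v - W} p_x^1.$

Thus, if we let:

$f_x(W) := \begin{cases} p_x^0, & \text{if } x \in W; \\ p_x^1, & \text{if } x \in C_v - W, \end{cases}$

then:

$\mathbb{P}\Big(\bigcap_{x \in C_v} E_x\Big) = \sum_{W \subseteq C_v} p_W \prod_{x \in C_v} f_x(W).$

Moreover:

$p_x = p_x^0 \, \mathbb{P}(\xi(x) = 0) + p_x^1 \, \mathbb{P}(\xi(x) = 1) = \sum_{W \subseteq C_v} p_W f_x(W),$

where the second equality arises by considering in the summation those $W$ containing
$x$ and those not containing $x$. Consequently, (9) is equivalent to the requirement
that:

(10) $\prod_{x \in C_v} \Big( \sum_{W \subseteq C_v} p_W f_x(W) \Big) \leq \sum_{W \subseteq C_v} p_W \prod_{x \in C_v} f_x(W).$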

The proof of (10) involves combining Proposition 2.1 with the FKG inequality of
Fortuin, Kasteleyn and Ginibre (1971) [7], a particular (and multivariate) form of
which we now recall.

Given a finite set $S$, suppose that $f_1, f_2, \ldots, f_n$ are functions from the power set
of $S$ into the non-negative real numbers, and that these satisfy the condition:

(11) $A \subseteq B \Rightarrow f_i(A) \leq f_i(B).$

Furthermore, suppose that $\mu$ is a probability measure on the subsets of $S$ which
satisfies the log-supermodularity condition:

(12) $\mu(A)\mu(B) \leq \mu(A \cup B)\mu(A \cap B).$

Then:

(13) $\prod_{i=1}^{n} \Big( \sum_{A} \mu(A) f_i(A) \Big) \leq \sum_{A} \mu(A) \prod_{i=1}^{n} f_i(A),$

where the summations are over all subsets of $S$.

We apply this form of the FKG inequality by taking $S = \{1, \ldots, n\} = C_v$,
$\mu(W) = p_W$, and $f_x$ as defined above. Then $f_x$ satisfies (11) by the hypothesis that
$p_x^1 \leq p_x^0$ for all $x$, while $\mu$ satisfies (12) by Proposition 2.1. Then inequality (13)
provides the required inequality (10). This completes the proof.
□
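
Theorem 3.1 can be illustrated on a small example. The sketch below is a numerical check under assumed parameters, not a substitute for the proof: it uses the tree shape of Fig. 1 with hypothetical arc lengths, one symmetric transition matrix `M` on every arc, and extinction probabilities 0.8 in state 0 and 0.2 in state 1. The function `expected_pd` (a hypothetical name) evaluates Eqn. (6) either under the s-FOB model or under the associated g-FOB model obtained from Eqn. (8).

```python
from itertools import product
from math import prod

# Assumed tree and arc lengths; C_v is stored for the head v of each arc.
LENGTHS = {("rho", "a"): 2.0, ("rho", "b"): 2.0, ("rho", "s"): 1.0,
           ("s", "c"): 1.0, ("s", "d"): 1.0}
CLADE = {"a": ["a"], "b": ["b"], "s": ["c", "d"], "c": ["c"], "d": ["d"]}
LEAVES = ["a", "b", "c", "d"]
VERTICES = ["rho", "s"] + LEAVES


def state_distribution(pi, M):
    """List (state, probability) pairs for all vertex labellings, via Eqn. (1)."""
    dist = []
    for bits in product([0, 1], repeat=len(VERTICES)):
        state = dict(zip(VERTICES, bits))
        pr = pi[state["rho"]]
        for r, s in LENGTHS:
            pr *= M[state[r]][state[s]]
        dist.append((state, pr))
    return dist


def expected_pd(dist, ext, independent):
    """Eqn. (6). ext[i][x] is the extinction probability of leaf x in state i.

    independent=False: s-FOB (leaf states are correlated through the tree).
    independent=True: associated g-FOB, using the marginals p_x of Eqn. (8).
    """
    def clade_extinct(v):
        if independent:
            return prod(sum(pr * ext[st[x]][x] for st, pr in dist)
                        for x in CLADE[v])
        return sum(pr * prod(ext[st[x]][x] for x in CLADE[v])
                   for st, pr in dist)
    return sum(lam * (1 - clade_extinct(v)) for (_, v), lam in LENGTHS.items())


pi = [0.5, 0.5]
M = [[0.9, 0.1], [0.1, 0.9]]              # det = 0.8 >= 0
ext = {0: {x: 0.8 for x in LEAVES},       # state 0: elevated risk
       1: {x: 0.2 for x in LEAVES}}       # state 1: advantageous
dist = state_distribution(pi, M)
pd_sfob = expected_pd(dist, ext, independent=False)
pd_gfob = expected_pd(dist, ext, independent=True)
print(pd_sfob, pd_gfob)
```

In this example the two models agree on every pendant arc, and differ only on the arc into `s`, where the positive correlation between the states of `c` and `d` raises the probability that the whole clade is lost.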



3.2. Combinatorics of parsimony. We now describe a further application of
Proposition 2.1 by deriving a purely combinatorial result concerning a measure
('parsimony score') that underlies certain approaches for inferring evolutionary
history (see, for example, [6]).

Given a function $f : X \to \{0, 1\}$, recall that the parsimony score of $f$ on $T$,
denoted $l(f, T)$, is the minimum number of edges that have different states assigned
to their endpoints, across all extensions $F : V_T \to \{0, 1\}$ of $f$. For example, for the
tree $T$ in Fig. 1, and the function $f$ defined by $f(a) = f(c) = 0$, $f(b) = f(d) = 1$,
we have $l(f, T) = 2$; for this example, there are two minimal extensions $F$ of $f$
corresponding to $F(\rho) = F(s) = i$ for $i \in \{0, 1\}$ (for further details concerning the
mathematical properties of parsimony score, see [12]). For $W \subseteq X$, let $f_W$ denote
the function that assigns state 0 to the elements of $W$, and assigns state 1 to the
elements of $X - W$. The following result states that the parsimony score function
for a given tree is submodular.

Theorem 3.2. For any tree $T$ with leaf set $X$ and subsets $Y, Z$ of $X$ we have:

$l(f_Y, T) + l(f_Z, T) \geq l(f_{Y \cup Z}, T) + l(f_{Y \cap Z}, T).$



Proof. Consider the 2-state Markov random field on $T$ with $\pi_0 = \pi_1 = 0.5$, and set
each transition matrix $P^{(r,s)}$ to be the symmetric $2 \times 2$ matrix with off-diagonal
entry $\epsilon > 0$. Then, for any $W \subseteq X$ a straightforward calculation shows that:

(14) $p_W = C_W \, \epsilon^{l(f_W, T)} (1 + O(\epsilon)),$

for a constant $C_W$ that depends only on $W$ and $T$ and not $\epsilon$ (specifically, $C_W$ is
the number of minimal extensions of $f_W$ to the vertices of $T$ multiplied by $\frac{1}{2}$). Now
Proposition 2.1, expressed using logarithms, states that:

(15) $-\log(p_Y) - \log(p_Z) \geq -\log(p_{Y \cup Z}) - \log(p_{Y \cap Z}).$

Applying (14) (and noting that $\log(1 + O(\epsilon)) = O(\epsilon)$), the left-hand side of (15) is:

$(l(f_Y, T) + l(f_Z, T)) \log\Big(\frac{1}{\epsilon}\Big) - \log(C_Y C_Z) + O(\epsilon),$

while the right-hand side of (15) is:

$(l(f_{Y \cup Z}, T) + l(f_{Y \cap Z}, T)) \log\Big(\frac{1}{\epsilon}\Big) - \log(C_{Y \cup Z} C_{Y \cap Z}) + O(\epsilon).$

Theorem 3.2 now follows by dividing both sides by $\log(1/\epsilon)$ and letting $\epsilon$ tend to zero. □
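
The parsimony score and the submodular inequality of Theorem 3.2 are easy to check exhaustively on a small tree. The following sketch assumes the tree shape suggested by the Fig. 1 caption (root `rho` with children `a`, `b`, `s`, and `s` with children `c`, `d`) and computes the score by brute force over all extensions rather than by a dedicated algorithm such as Fitch's; `parsimony_score` and `subsets` are illustrative names.

```python
from itertools import product

# Assumed tree: root rho with children a, b, s; s with children c, d.
ARCS = [("rho", "a"), ("rho", "b"), ("rho", "s"), ("s", "c"), ("s", "d")]
INTERNAL, LEAVES = ["rho", "s"], ["a", "b", "c", "d"]


def parsimony_score(W):
    """l(f_W, T): leaves in W get state 0, the rest state 1; minimise the
    number of arcs with differing endpoint states over all extensions F."""
    f = {x: 0 if x in W else 1 for x in LEAVES}
    best = len(ARCS)
    for bits in product([0, 1], repeat=len(INTERNAL)):
        F = {**f, **dict(zip(INTERNAL, bits))}
        best = min(best, sum(F[r] != F[s] for r, s in ARCS))
    return best


def subsets(items):
    """Yield every subset of items as a frozenset."""
    for bits in product([0, 1], repeat=len(items)):
        yield frozenset(x for x, b in zip(items, bits) if b)


# The example from the text: f(a) = f(c) = 0, f(b) = f(d) = 1.
example = parsimony_score({"a", "c"})

# Exhaustive check of Theorem 3.2 over all pairs of leaf subsets.
violations = [
    (Y, Z)
    for Y in subsets(LEAVES)
    for Z in subsets(LEAVES)
    if parsimony_score(Y) + parsimony_score(Z)
    < parsimony_score(Y | Z) + parsimony_score(Y & Z)
]
print(example, len(violations))
```

The brute-force search is exponential in the number of internal vertices, so it is only suitable for small illustrative trees like this one.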



4. Concluding Remarks 

(i) It is possible to establish Theorem 3.2 using a purely combinatorial proof,
by first invoking Menger's theorem from graph theory to handle the case 
where Y and Z are disjoint, and then using a complementation argument 
for the case Y U Z = X. The remaining case where X — (Y U Z) is non- 
empty can then be established by a somewhat detailed argument that uses 
induction on \X\. 
(ii) Proposition 2.1 provides a collection of polynomial inequalities on the $p_W$
values, which have recently been studied for a particular class of 2-state
Markov models in [9]. These polynomial inequalities complement the much-studied
'phylogenetic invariants' (polynomial identities in the $p_W$ values),
which hold under various restrictions on the Markov model. Combining
these phylogenetic invariants with the polynomial inequalities provides a
way of characterizing when a probability distribution arises on a tree under
a Markov process (either with or without restrictions). For the 2-state
Markov process on a tree with 3 leaves, this was solved in [11].
(iii) It may be of interest to derive an extension of Proposition 2.1 that applies
when the state space has a size greater than 2.